Abstract. We illustrate a general technique for enumerating factors of
k-automatic sequences by proving a conjecture on the number f(n) of
unbordered factors of the Thue-Morse sequence. We show that f(n) ≤ n
for n ≥ 4 and that f(n) = n infinitely often. We also give examples of
automatic sequences having exactly 2 unbordered factors of every length.

1 Introduction 

In this paper, we are concerned with certain factors of k-automatic sequences.
Roughly speaking, a sequence x = a_0 a_1 a_2 ··· over a finite alphabet A is said to
be k-automatic if there exists a finite automaton that, on input n expressed in
base k, reaches a state with output a_n. Automatic sequences were popularized
by a celebrated paper of Cobham [3] and have been widely studied; see pQ. 

More precisely, let k be an integer ≥ 2, and set Σ_k = {0, 1, ..., k - 1}. Let
M = (Q, Σ_k, A, δ, q_0, τ) be a deterministic finite automaton with output (DFAO)
with transition function δ : Q × Σ_k → Q and output function τ : Q → A. Let
(n)_k denote the canonical base-k representation of n, without leading zeros,
and starting with the most significant digit. Then we say that M generates the
sequence (a_n)_{n≥0} if a_n = τ(δ(q_0, (n)_k)) for all n ≥ 0.

The prototypical example of a k-automatic sequence is the Thue-Morse
sequence t = t_0 t_1 t_2 ··· = 01101001···, defined by the relations t_0 = 0 and
t_{2n} = t_n, t_{2n+1} = 1 - t_n for n ≥ 0. It is generated by the DFAO below in
Figure 1.




Fig. 1. A finite automaton generating the Thue-Morse sequence t 
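
As a minimal sketch of the definition above, the following Python snippet
simulates a two-state DFAO for t, reading the base-2 digits of n from the most
significant end. The state names, the dictionary encoding and the function
names are illustrative assumptions; the transition structure (stay on 0,
switch on 1) is the usual automaton for the Thue-Morse sequence and may be
drawn differently in the figure.

```python
# Two-state DFAO sketch: the state records the parity of the 1s read so far.
DELTA = {("q0", 0): "q0", ("q0", 1): "q1",
         ("q1", 0): "q1", ("q1", 1): "q0"}
TAU = {"q0": 0, "q1": 1}  # output function


def base_k_digits(n, k=2):
    """Canonical base-k digits of n, most significant first; empty for n = 0."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits[::-1]


def thue_morse_term(n):
    """Run the DFAO on (n)_2 and return the output of the final state."""
    state = "q0"
    for d in base_k_digits(n):
        state = DELTA[(state, d)]
    return TAU[state]


prefix = [thue_morse_term(n) for n in range(8)]
print(prefix)  # [0, 1, 1, 0, 1, 0, 0, 1]
```

The printed prefix agrees with 01101001, and the defining relations
t_{2n} = t_n and t_{2n+1} = 1 - t_n can be checked on the same function.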



A factor of the sequence x is a finite word of the form a_i ··· a_j. A finite
word w is said to be bordered if there is some finite nonempty word x ≠ w that
is both a prefix and a suffix of w [1211 1161^5] . For example, the English word 
ionization is bordered, as it begins and ends with ion. Otherwise w is said to 
be unbordered. 
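
The definition translates directly into a short test. The sketch below (the
function names are our own) lists every border of a word, that is, every
nonempty proper prefix that is also a suffix.

```python
def borders(w):
    """All nonempty proper prefixes of w that are also suffixes of w."""
    return [w[:m] for m in range(1, len(w)) if w[:m] == w[-m:]]


def is_bordered(w):
    return len(borders(w)) > 0


print(borders("ionization"))  # ['ion']
print(is_bordered("0011"))    # False
```

Under this definition the empty word and all words of length 1 are
unbordered, since they have no nonempty proper prefix.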

Recently, there has been significant interest in the properties of unbordered 
factors; see, for example, [91815)10,. In particular, Currie and Saari [3] studied 
the unbordered factors of the Thue-Morse word. 

Currie and Saari [3] proved that if n ≢ 1 (mod 6), then the Thue-Morse word
has an unbordered factor of length n, but left it open to decide for which lengths 
congruent to 1 (mod 6) this property holds. This was solved in [7], where the 
following characterization is given: 

Theorem 1. The Thue-Morse sequence t has an unbordered factor of length n
if and only if (n)_2 ∉ 1(01*0)*10*1.
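
The criterion of Theorem 1 is easy to evaluate mechanically. The following
sketch, with names of our own choosing, matches the binary representation of n
against the regular expression 1(01*0)*10*1 and lists the lengths up to 31 for
which the theorem excludes an unbordered factor.

```python
import re

# Regular expression from Theorem 1, applied to the binary representation of n.
NO_UNBORDERED = re.compile(r"1(01*0)*10*1")


def has_unbordered_factor(n):
    """True if Theorem 1 predicts an unbordered factor of length n (n >= 1)."""
    return NO_UNBORDERED.fullmatch(format(n, "b")) is None


excluded = [n for n in range(1, 32) if not has_unbordered_factor(n)]
print(excluded)  # [7, 13, 19, 25]
```

All four excluded lengths are congruent to 1 (mod 6), in line with the result
of Currie and Saari quoted above.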

A harder problem is to come up with an expression for the number of
unbordered factors of t. In [2], the second author and co-authors made the
following conjecture:

Conjecture 1. Let f(n) denote the number of unbordered factors of length n in
t, the Thue-Morse sequence. Then f is given by f(0) = 1, f(1) = 2, f(2) = 2,
and the system of recurrences

    f(4n + 1)   = f(2n + 1)
    f(8n + 2)   = f(2n + 1) - 8f(4n) + f(4n + 3) + 4f(8n)
    f(8n + 3)   = 2f(2n) - f(2n + 1) + 5f(4n) + f(4n + 2) - 3f(8n)
    f(8n + 4)   = -4f(4n) + 2f(4n + 2) + 2f(8n)
    f(8n + 6)   = 2f(2n) - f(2n + 1) + f(4n) + f(4n + 2) + f(4n + 3) - f(8n)
    f(16n)      = -2f(4n) + 3f(8n)                                          (1)
    f(16n + 7)  = -2f(2n) + f(2n + 1) - 5f(4n) + f(4n + 2) + 3f(8n)
    f(16n + 8)  = -8f(4n) + 4f(4n + 2) + 4f(8n)
    f(16n + 15) = -8f(4n) + 2f(4n + 3) + 4f(8n) + f(8n + 7)

for n ≥ 0.
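
To make the system concrete, here is a minimal Python sketch that evaluates
f(n) from the three initial values and the nine recurrences of (1). The
dispatch on n mod 16 and the use of memoization are our own choices, not taken
from the paper's Maple code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n):
    """Evaluate f(n) from the initial values and the system (1)."""
    if n == 0:
        return 1
    if n in (1, 2):
        return 2
    if n % 4 == 1:
        m = n // 4
        return f(2*m + 1)
    if n % 8 == 2:
        m = n // 8
        return f(2*m + 1) - 8*f(4*m) + f(4*m + 3) + 4*f(8*m)
    if n % 8 == 3:
        m = n // 8
        return 2*f(2*m) - f(2*m + 1) + 5*f(4*m) + f(4*m + 2) - 3*f(8*m)
    if n % 8 == 4:
        m = n // 8
        return -4*f(4*m) + 2*f(4*m + 2) + 2*f(8*m)
    if n % 8 == 6:
        m = n // 8
        return 2*f(2*m) - f(2*m + 1) + f(4*m) + f(4*m + 2) + f(4*m + 3) - f(8*m)
    m = n // 16
    if n % 16 == 0:
        return -2*f(4*m) + 3*f(8*m)
    if n % 16 == 7:
        return -2*f(2*m) + f(2*m + 1) - 5*f(4*m) + f(4*m + 2) + 3*f(8*m)
    if n % 16 == 8:
        return -8*f(4*m) + 4*f(4*m + 2) + 4*f(8*m)
    # Remaining case: n % 16 == 15.
    return -8*f(4*m) + 2*f(4*m + 3) + 4*f(8*m) + f(8*m + 7)


print([f(n) for n in range(16)])
# [1, 2, 2, 4, 2, 4, 6, 0, 4, 4, 4, 4, 12, 0, 4, 4]
```

Every residue class modulo 16 is covered by exactly one branch, and each
right-hand side refers only to smaller arguments once n > 2, so the recursion
terminates.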

This conjecture was obtained by computing a large number of values of f
and then looking for possible linear relations among subsequences of the form
(f(2^i n + j))_{n≥0}.

This system suffices to calculate f efficiently, in O(log n) arithmetic steps.

We now summarize the rest of the paper. In Section 2 we prove Conjecture 1.
In Section 3 we discuss how to obtain relations like those above for a given
k-regular sequence. In Section 4 we discuss the growth rate of f in detail.
Finally, in Section 5 we give examples of other sequences with interesting
numbers of unbordered factors.



2 Proof of the conjecture 



We now outline our computational proof of Conjecture 1.

First, we need a little notation. We extend the notion of canonical base-k
representation of a single non-negative integer to tuples of such integers. For
example, by (m, n)_k we mean the unique word over the alphabet Σ_k × Σ_k such
that the projection π_1 onto the first coordinate gives the base-k representation
of m, and the projection π_2 onto the second coordinate gives the base-k
representation of n, where the shorter representation is padded with leading 0's,
if necessary, so that the representations have the same length. For example,
(43, 17)_2 = [1,0] [0,1] [1,0] [0,0] [1,0] [1,1], since (43)_2 = 101011 and
(17)_2 = 10001 is padded to 010001.
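
The padded pair representation can be computed as follows. This is a small
sketch with function names of our own; it returns the word as a list of digit
pairs.

```python
def base_k_digits(n, k):
    """Canonical base-k digits of n, most significant first; empty for n = 0."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits[::-1]


def pair_representation(m, n, k=2):
    """Digits of m and n, padded with leading zeros to a common length."""
    dm, dn = base_k_digits(m, k), base_k_digits(n, k)
    width = max(len(dm), len(dn))
    dm = [0] * (width - len(dm)) + dm
    dn = [0] * (width - len(dn)) + dn
    return list(zip(dm, dn))


print(pair_representation(43, 17))
# [(1, 0), (0, 1), (1, 0), (0, 0), (1, 0), (1, 1)]
```

Projecting onto the first coordinate gives 101011 = (43)_2, and onto the
second gives 010001, which is (17)_2 with one leading zero.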



Proof. Step 1: Using the ideas in [7], we created an automaton A of 23 states
that accepts the language L of all words (n, i)_2 such that there is a "novel"
unbordered factor of length n in t beginning at position i. Here "novel" means
that this factor does not previously appear in any position to the left. Thus,
the number of such words with first component equal to (n)_2 equals f(n), the
number of unbordered factors of t of length n. This automaton is illustrated
below in Figure 2 (rotated to fit the figure more clearly).
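
The counting idea of Step 1 can be imitated by brute force on a finite prefix
of t. The sketch below is not the automaton construction of the paper: it
scans positions i from left to right, skips occurrences that are not novel,
and counts the novel unbordered ones. The prefix length 2048 is an assumption,
chosen generously so that every factor of the lengths tested occurs in it.

```python
def thue_morse(length):
    """First `length` terms of t; t_n is the parity of the binary digit sum of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def is_bordered(w):
    return any(w[:m] == w[-m:] for m in range(1, len(w)))


def count_novel_unbordered(t, n):
    """Count positions i where t[i:i+n] is unbordered and has not occurred earlier."""
    seen = set()
    count = 0
    for i in range(len(t) - n + 1):
        factor = tuple(t[i:i + n])
        if factor in seen:
            continue  # not novel: the same word already starts further left
        seen.add(factor)
        if not is_bordered(factor):
            count += 1
    return count


T = thue_morse(2048)  # assumed long enough for the lengths tested below
print([count_novel_unbordered(T, n) for n in range(13)])
# [1, 2, 2, 4, 2, 4, 6, 0, 4, 4, 4, 4, 12]
```

Counting novel occurrences is the same as counting distinct factors, which is
why the result equals f(n) for these small lengths.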



Step 2: Using the ideas in [2], we now know that f is a 2-regular sequence,
with a "linear representation" that can be deduced from the structure of A.
This gives matrices M_0, M_1 of dimension 23 and vectors v, w such that f(n) =
v M_{a_1} ··· M_{a_i} w, where a_1 ··· a_i is the base-2 representation of n, written
with the most significant digit first. They are given below.
Fig. 2. Automaton accepting (n, i)_2 such that there is a novel unbordered factor of
length n at position i of t

v = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
w = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]^T
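
Evaluating a linear representation is a plain product of a row vector with one
matrix per digit. The following sketch shows the mechanics on a toy example
rather than on the 23-dimensional matrices of the proof: the matrices below
are our own illustrative assumption and encode the two-state Thue-Morse
automaton as permutation matrices, so that the product returns t_n.

```python
def base_k_digits(n, k=2):
    """Canonical base-k digits of n, most significant first; empty for n = 0."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits[::-1]


def row_times_matrix(x, M):
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def evaluate(v, mats, w, n, k=2):
    """Return v * M_{a_1} * ... * M_{a_i} * w, where a_1...a_i = (n)_k."""
    x = list(v)
    for a in base_k_digits(n, k):
        x = row_times_matrix(x, mats[a])
    return sum(xi * wi for xi, wi in zip(x, w))


# Toy representation (illustrative assumption, not the matrices of the proof).
V = [1, 0]
M0 = [[1, 0], [0, 1]]
M1 = [[0, 1], [1, 0]]
W = [0, 1]

print([evaluate(V, [M0, M1], W, n) for n in range(8)])
# [0, 1, 1, 0, 1, 0, 0, 1]
```

The cost is one vector-matrix product per digit of n, which is the source of
the logarithmic number of arithmetic steps for a fixed dimension.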

Step 3: Now each of the identities in (1) corresponds to a certain identity in
matrices. For example, the identity f(16n) = -2f(4n) + 3f(8n) can be written
as

    v M M_0 M_0 M_0 M_0 w = -2 v M M_0 M_0 w + 3 v M M_0 M_0 M_0 w,        (2)

where M is the matrix product corresponding to the base-2 expansion of n. More
generally, we can think of M as some arbitrary product of the matrices M_0 and
M_1, starting with at least one M_1; this corresponds to an arbitrary n ≥ 1. We
can think of M as a matrix of indeterminates. Then (2) represents an assertion
about the entries of M which can be verified. Of course, the entries of M are not
completely arbitrary, since they come about as M_1 times some product of M_0
and M_1. We can compute the (positive) transitive closure of M_0 + M_1 and then
multiply on the left by M_1; the entries that are 0 in the result will be 0 in
any product of M_1 times a product of the matrices M_0 and M_1. Thus we can
replace the corresponding indeterminates by 0, which makes verifying (2) easier.

Another approach, which is even simpler, is to consider vM in place of M.
This reduces the number of entries it is required to check from d^2 to d, where d
is the dimension of the matrices.
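
The reduction to vM can be illustrated on a toy sequence. The example below is
our own and is not part of the proof: s(n), the number of 1s in (n)_2, has the
linear representation v = [1, 0], M_0 = I, M_1 = [[1, 1], [0, 1]],
w = [0, 1]^T. The identity s(4n + 3) = 2s(2n + 1) - s(n) reads
x M_1 M_1 w = 2 x M_1 w - x w with x = vM. It holds for every row vector x of
indeterminates exactly when the corresponding column vectors agree, so only d
entries need to be compared.

```python
def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# Toy linear representation of s(n) = number of 1s in (n)_2 (illustrative assumption).
M0 = [[1, 0], [0, 1]]
M1 = [[1, 1], [0, 1]]
W = [0, 1]

# Claimed identity: s(4n + 3) = 2 s(2n + 1) - s(n), i.e.
# x M1 M1 w = 2 x M1 w - x w for the row vector x = v M.
lhs = mat_vec(M1, mat_vec(M1, W))
rhs = [2 * p - q for p, q in zip(mat_vec(M1, W), W)]
identity_holds = lhs == rhs
print(lhs, rhs, identity_holds)  # [2, 1] [2, 1] True


# Independent numerical confirmation on the sequence itself.
def s(n):
    return bin(n).count("1")


print(all(s(4*n + 3) == 2 * s(2*n + 1) - s(n) for n in range(1000)))  # True
```

Equality of the column vectors is sufficient but not necessary: as noted in
Step 3, only the reachable vectors vM matter, so entries known to vanish can
be ignored.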

Step 4: Finally, we have to verify the identities for n = 0 and n = 1, which is
easy.

We carried out this computation in Maple for the matrices M_0 and M_1
corresponding to A, which completes the proof. The Maple program can be
downloaded from

    http://www.cs.uwaterloo.ca/~shallit/papers.html



3 Determining the relations 



The verification method of the previous section can be extended to a method
to mechanically find the relations for any given k-regular sequence g (instead of
guessing them and verifying them), given the linear representation of g.

Suppose we are given the linear representation of a k-regular sequence g, that
is, vectors v, w and matrices M_0, M_1, ..., M_{k-1} such that
g(n) = v M_{a_1} M_{a_2} ··· M_{a_j} w, where a_1 a_2 ··· a_j = (n)_k.

Now let M be arbitrary and consider vM as a vector with variable entries, say
[a_1, a_2, ..., a_d]. Successively compute v M M_y w for words y of length 0, 1, 2, ...
over Σ_k = {0, 1, ..., k - 1}, where M_y denotes the product M_{y_1} ··· M_{y_m} for
y = y_1 ··· y_m; this will give an expression in terms of the variables
a_1, ..., a_d. After at most d + 1 such expressions, we find an expression for
v M M_y w for some y as a linear combination of previously computed expressions.
When this happens, we no longer need to consider any expression having y as a
suffix. Eventually the procedure halts, and this corresponds to a system of
equations like that in (1).

Consider the following example. Let k = 2, v = [6, 1], w = [2, 4]^T, and

    M_0 = [ -3  1 ]        M_1 = [  0  2 ]
          [  1  4 ],             [ -3  1 ].

Suppose M is some product of M_0 and M_1, and suppose vM = [a, b].
We find

    v M w         =  2a +  4b
    v M M_0 w     = -2a + 18b
    v M M_1 w     =  8a -  2b
    v M M_0 M_0 w = 24a + 70b
    v M M_1 M_0 w = 36a + 24b

and, solving the linear systems, we get

    v M M_1 w     = (35/11) v M w - (9/11) v M M_0 w
    v M M_0 M_0 w = 13 v M w + v M M_0 w
    v M M_1 M_0 w = (174/11) v M w - (24/11) v M M_0 w.

This gives us

    g(2n + 1) = (35/11) g(n) - (9/11) g(2n)
    g(4n)     = 13 g(n) + g(2n)
    g(4n + 2) = (174/11) g(n) - (24/11) g(2n)

for n ≥ 1.
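
The procedure of this section can be sketched in a few lines of Python. This
is our own illustration under stated assumptions, not the authors' Maple code:
since v M M_y w is a linear form in the entries of vM with coefficient vector
M_y w, the sketch works directly with the column vectors M_y w, keeps a list
of independent ones, and records a relation whenever a new vector depends on
them. The matrices are those of the example, with M_0 = [[-3, 1], [1, 4]] and
M_1 = [[0, 2], [-3, 1]]. A word y stands for the subsequence whose index has
binary expansion (n)_2 y, for instance y = 10 for g(4n + 2).

```python
from fractions import Fraction


def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def solve_combination(basis, target):
    """Coefficients c with sum(c[j] * basis[j]) == target, or None if impossible."""
    d, m = len(target), len(basis)
    rows = [[Fraction(basis[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(d)]
    pivots, r = [], 0
    for col in range(m):  # Gauss-Jordan elimination over the rationals
        piv = next((i for i in range(r, d) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(d):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [p - factor * q for p, q in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in rows):
        return None  # target is independent of the basis
    coeffs = [Fraction(0)] * m
    for i, col in enumerate(pivots):
        coeffs[col] = rows[i][-1]
    return coeffs


def find_relations(mats, w):
    """Map each dependent word y to its expansion over the independent words."""
    basis_words, basis_vecs, relations = [], [], {}
    queue = [""]  # words are processed in order of increasing length
    while queue:
        y = queue.pop(0)
        vec = list(w)
        for c in reversed(y):  # M_y w = M_{y_1}( ... (M_{y_m} w))
            vec = mat_vec(mats[int(c)], vec)
        coeffs = solve_combination(basis_vecs, vec)
        if coeffs is None:
            basis_words.append(y)
            basis_vecs.append(vec)
            queue.extend(str(c) + y for c in range(len(mats)))
        else:  # no word having y as a suffix is explored further
            relations[y] = dict(zip(basis_words, coeffs))
    return relations


M0 = [[-3, 1], [1, 4]]
M1 = [[0, 2], [-3, 1]]
W = [2, 4]
for y, combo in find_relations([M0, M1], W).items():
    print(y, {z: str(c) for z, c in combo.items()})
# 1 {'': '35/11', '0': '-9/11'}
# 00 {'': '13', '0': '1'}
# 10 {'': '174/11', '0': '-24/11'}
```

The three printed lines correspond to the relations for g(2n + 1), g(4n) and
g(4n + 2) in terms of g(n) (the empty word) and g(2n) (the word 0).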



4 The growth rate of f(n)

We now return to f(n), the number of unbordered factors of t of length n. Here 
is a brief table of f(n): 



    n    |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    f(n) |  1  2  2  4  2  4  6  0  4  4  4  4 12  0  4  4

    n    | 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    f(n) |  8  4  8  0  8  4  4  8 24  0  4  4  8  4  8  4


Kalle Saari (personal communication) asked about the growth rate of f(n).
The following result characterizes it.

Theorem 2. We have f(n) ≤ n for n ≥ 4. Furthermore, f(n) = n infinitely
often. Thus, limsup_{n≥1} f(n)/n = 1.

Proof. We start by verifying the following relations:

    f(4n)     = 2f(2n)                              (n ≥ 2)    (3)
    f(4n + 1) = f(2n + 1)                           (n ≥ 0)    (4)
    f(8n + 2) = f(2n + 1) + f(4n + 3)               (n ≥ 1)    (5)
    f(8n + 3) = -f(2n + 1) + f(4n + 2)              (n ≥ 2)    (6)
    f(8n + 6) = -f(2n + 1) + f(4n + 2) + f(4n + 3)  (n ≥ 2)    (7)
    f(8n + 7) = 2f(2n + 1) + f(4n + 3)              (n ≥ 3)    (8)

These can be verified in exactly the same way that we verified the system
(1) earlier.

We now verify, by induction on n, that f(n) ≤ n for n ≥ 4. The base case is
n = 4, and f(4) = 2. Now assume n ≥ 5 and that the bound holds for all smaller
values that are at least 4. We distinguish cases according to the residue of n.

- If n ≡ 0 (mod 4), say n = 4m with m ≥ 2, then f(4m) = 2f(2m) ≤ 2 · 2m =
  4m by (3) and induction.

- If n ≡ 1 (mod 4), say n = 4m + 1 with m ≥ 1, then f(4m + 1) = f(2m + 1)
  by (4). But f(2m + 1) ≤ 2m + 1 by induction for m ≥ 2. The case m = 1
  corresponds to f(5) = 4 < 5.

- If n ≡ 2 (mod 8), say n = 8m + 2, then for m ≥ 2 we have f(8m + 2) =
  f(2m + 1) + f(4m + 3) ≤ 6m + 4 by (5) and induction, which is less than
  8m + 2. If m = 1, then f(10) = 4 < 10.

- If n ≡ 3 (mod 8), say n = 8m + 3, then for m ≥ 2 we have f(8m + 3) =
  -f(2m + 1) + f(4m + 2) ≤ f(4m + 2) ≤ 4m + 2 by (6) and induction. If
  m = 1, then f(11) = 4 < 11.

- If n ≡ 6 (mod 8), say n = 8m + 6, then f(8m + 6) = -f(2m + 1) + f(4m + 2)
  + f(4m + 3) ≤ f(4m + 2) + f(4m + 3) ≤ 8m + 5 by (7) and induction, provided
  m ≥ 2. For m = 0 we have f(6) = 6 and for m = 1 we have f(14) = 4.

- If n ≡ 7 (mod 8), say n = 8m + 7, then f(8m + 7) = 2f(2m + 1) + f(4m + 3) ≤
  2(2m + 1) + 4m + 3 = 8m + 5 for m ≥ 3, by (8) and induction. The cases
  m = 0, 1, 2 can be verified by inspection.

This completes the proof that f(n) ≤ n.

It remains to see that f(n) = n infinitely often. We do this by showing that
f(n) = n for n of the form 3 · 2^i, i ≥ 1. Let us prove this by induction on i.
It is true for i = 1 since f(6) = 6. For the induction step, let i ≥ 1; using (3)
with 3 · 2^{i-1} ≥ 2 in place of n, we have
f(3 · 2^{i+1}) = 2f(3 · 2^i) = 2 · 3 · 2^i = 3 · 2^{i+1} by induction. This also
implies the claim limsup_{n≥1} f(n)/n = 1.
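
The relations (3)-(8) and both claims of Theorem 2 can be spot-checked against
the values f(0), ..., f(31) from the table at the start of this section, where
f(n) = 0 exactly for n = 7, 13, 19, 25, in line with Theorem 1. The sketch
below is only a finite sanity check with names of our own, not a substitute
for the matrix verification.

```python
# Values f(0), ..., f(31) from the table at the start of this section.
F = [1, 2, 2, 4, 2, 4, 6, 0, 4, 4, 4, 4, 12, 0, 4, 4,
     8, 4, 8, 0, 8, 4, 4, 8, 24, 0, 4, 4, 8, 4, 8, 4]

# Each entry: (label, smallest valid n, index of the left side, right side).
RELATIONS = [
    ("(3)", 2, lambda n: 4*n,     lambda n: 2*F[2*n]),
    ("(4)", 0, lambda n: 4*n + 1, lambda n: F[2*n + 1]),
    ("(5)", 1, lambda n: 8*n + 2, lambda n: F[2*n + 1] + F[4*n + 3]),
    ("(6)", 2, lambda n: 8*n + 3, lambda n: -F[2*n + 1] + F[4*n + 2]),
    ("(7)", 2, lambda n: 8*n + 6, lambda n: -F[2*n + 1] + F[4*n + 2] + F[4*n + 3]),
    ("(8)", 3, lambda n: 8*n + 7, lambda n: 2*F[2*n + 1] + F[4*n + 3]),
]


def failures():
    """Instances of (3)-(8) inside the table that do not hold."""
    bad = []
    for label, start, lhs_index, rhs in RELATIONS:
        n = start
        while lhs_index(n) < len(F):
            if F[lhs_index(n)] != rhs(n):
                bad.append((label, n))
            n += 1
    return bad


print(failures())                                  # []
print(all(F[n] <= n for n in range(4, len(F))))    # True
print([n for n in range(4, len(F)) if F[n] == n])  # [6, 12, 24]
```

The lower bounds on n in (3)-(8) matter: for example, (6) fails at n = 1,
since f(11) = 4 while -f(3) + f(6) = 2.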

5 Unbordered factors of other sequences 

We can carry out similar computations for other famous sequences. In some cases 
the automata and the corresponding matrices are very large, which renders the 
computations time-consuming and the asymptotic behavior less transparent. We 
report on some of these computations, omitting the details. 

Theorem 3. Let r = r_0 r_1 r_2 ··· = 00010010··· denote the Rudin-Shapiro
sequence, defined by r_n = the number of occurrences, taken modulo 2, of '11' in
the binary expansion of n. Let f_r(n) denote the number of unbordered factors of
length n in r. Then f_r(n) ≤ (21/8) n for all n ≥ 1. Furthermore, if n = 2^i + 1,
then f_r(n) = 21 · 2^{i-3} for i ≥ 4.

Theorem 4. Let p = p_0 p_1 p_2 ··· = 0100··· be the so-called "period-doubling"
sequence, defined by

    p_n = 1 if t_n = t_{n+1}, and p_n = 0 otherwise,

where t_0 t_1 t_2 ··· is the Thue-Morse word t. Note that p is the fixed point of the
morphism 0 → 01 and 1 → 00. Then f_p(n), the number of unbordered factors of
p of length n, is equal to 2 for all n ≥ 1.

The period-doubling sequence can be generalized to base k ≥ 2, as follows:

    p_k := (ν_k(n + 1) mod 2)_{n≥0},

where ν_k(x) is the exponent of the largest power of k dividing x. For each k, the
corresponding sequence is a binary sequence that is k-automatic:

Theorem 5. Let k be an integer ≥ 2. The sequence p_k is the fixed point of the
morphism φ_k, where

    φ_k(0) = 0^{k-1} 1,    φ_k(1) = 0^k.



Proof. Note that p_k(n) = c iff ν_k(n + 1) = 2j + c for some integer j ≥ 0 and
c ∈ {0, 1}.

If 0 ≤ a < k - 1, then p_k(kn + a) = ν_k(kn + a + 1) mod 2 = 0. If a = k - 1 we
have p_k(kn + a) = ν_k(kn + k) mod 2 = ν_k(k(n + 1)) mod 2 = (2j + c + 1) mod 2.
Hence if p_k(n) = 0, then p_k[kn..kn + k - 1] = 0^{k-1} 1, while if p_k(n) = 1, then
p_k[kn..kn + k - 1] = 0^k. It follows that p_k is the fixed point of φ_k.
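
Theorem 5 can be checked numerically on prefixes. The sketch below, with
function names of our own, builds p_k once from the valuation ν_k and once by
iterating the morphism 0 → 0^{k-1} 1, 1 → 0^k starting from the letter 0, and
compares the results.

```python
def nu(k, x):
    """Exponent of the largest power of k dividing x (x >= 1)."""
    e = 0
    while x % k == 0:
        x //= k
        e += 1
    return e


def p_direct(k, length):
    """Prefix of p_k from the definition nu_k(n + 1) mod 2."""
    return [nu(k, n + 1) % 2 for n in range(length)]


def p_morphism(k, length):
    """Prefix of the fixed point of phi_k, obtained by iterating from the letter 0."""
    word = [0]
    while len(word) < length:
        word = [c for a in word for c in ([0] * (k - 1) + [1] if a == 0 else [0] * k)]
    return word[:length]


print(p_direct(2, 8))  # [0, 1, 0, 0, 0, 1, 0, 1]
print(all(p_direct(k, 500) == p_morphism(k, 500) for k in (2, 3, 4, 5)))  # True
```

Iterating from 0 is legitimate because φ_k(0) begins with 0, so each iterate
is a prefix of the next.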

The generalized sequence p_k has the same property of unbordered factors as
the period-doubling sequence:

Theorem 6. The number of unbordered factors of p_k of length n, for k ≥ 2 and
n ≥ 1, is equal to 2, and the two unbordered factors are reversals of each other.
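
Before turning to the proof, the statement can be tested by brute force for
small parameters. The sketch below enumerates the distinct factors of a finite
prefix of p_k; the prefix length 5000 is an assumption, chosen generously for
the lengths tested. For n = 1 the two unbordered factors are the letters 0 and
1, so the reversal property is only checked for n ≥ 2.

```python
def nu(k, x):
    """Exponent of the largest power of k dividing x (x >= 1)."""
    e = 0
    while x % k == 0:
        x //= k
        e += 1
    return e


def is_bordered(w):
    return any(w[:m] == w[-m:] for m in range(1, len(w)))


def unbordered_factors(seq, n):
    """Distinct unbordered length-n factors occurring in the finite prefix seq."""
    factors = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
    return sorted(w for w in factors if not is_bordered(w))


def check(k, n, prefix_length=5000):
    """Exactly two unbordered factors of length n, reversals of each other if n >= 2."""
    prefix = [nu(k, m + 1) % 2 for m in range(prefix_length)]  # assumed long enough
    found = unbordered_factors(prefix, n)
    return len(found) == 2 and (n == 1 or found[0] == found[1][::-1])


print(all(check(k, n) for k in (2, 3, 4) for n in range(1, 11)))  # True
print(unbordered_factors([nu(2, m + 1) % 2 for m in range(5000)], 5))
# [(0, 0, 1, 0, 1), (1, 0, 1, 0, 0)]
```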

We begin with some useful lemmas. 

Lemma 1. Let x ∈ {0, 1}* be a word. Then 0^{k-1} φ_k(x)^R = φ_k(x^R) 0^{k-1}.

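Proof. Suppose x = a_1 a_2 ··· a_n, where each a_i ∈ {0, 1}. If a ∈ {0, 1}, let ā
denote 1 - a, so that φ_k(a) = 0^{k-1} ā. Then

    0^{k-1} φ_k(x)^R = 0^{k-1} (0^{k-1} ā_1 0^{k-1} ā_2 ··· 0^{k-1} ā_n)^R
                     = 0^{k-1} ā_n 0^{k-1} ā_{n-1} ··· 0^{k-1} ā_1 0^{k-1}
                     = φ_k(x^R) 0^{k-1}.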
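
Lemma 1 is easy to confirm exhaustively for short words. In the sketch below
(names are our own), words are tuples of bits and the morphism is applied
letter by letter.

```python
from itertools import product


def phi(k, word):
    """Apply the morphism 0 -> 0^(k-1) 1, 1 -> 0^k to a tuple of bits."""
    out = []
    for a in word:
        out.extend([0] * (k - 1) + [1 - a])
    return tuple(out)


def lemma1_holds(k, x):
    zeros = (0,) * (k - 1)
    left = zeros + phi(k, x)[::-1]    # 0^(k-1) phi_k(x)^R
    right = phi(k, x[::-1]) + zeros   # phi_k(x^R) 0^(k-1)
    return left == right


print(all(lemma1_holds(k, x)
          for k in (2, 3, 4)
          for length in range(7)
          for x in product((0, 1), repeat=length)))  # True
```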

Lemma 2. If the word w is bordered, then φ_k(w) is bordered.

Proof. If w is bordered, then w = xyx for x ≠ ε. Then φ_k(w) = φ_k(x) φ_k(y) φ_k(x)
is bordered.

Lemma 3. If w is a factor of p_k, then so is w^R.

Proof. If w is a factor of p_k, then it is a factor of some prefix p_k[0..k^i - 1] for
some i ≥ 1. So it suffices to show that p_k[0..k^i - 1]^R appears as a factor of p_k.
In fact, we claim that

    p_k[0..k^i - 1]^R = p_k[k^i - 1..2k^i - 2].

To see this, it suffices to observe that ν_k(k^i - a) = ν_k(k^i + a) for 0 ≤ a < k^i.
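
The displayed identity in the proof of Lemma 3 can be verified directly for
small k and i. This is a minimal sketch with names of our own.

```python
def nu(k, x):
    """Exponent of the largest power of k dividing x (x >= 1)."""
    e = 0
    while x % k == 0:
        x //= k
        e += 1
    return e


def p(k, n):
    """Term n of the generalized period-doubling sequence p_k."""
    return nu(k, n + 1) % 2


def reversal_identity_holds(k, i):
    size = k ** i
    reversed_prefix = [p(k, n) for n in range(size)][::-1]     # p_k[0..k^i - 1]^R
    window = [p(k, n) for n in range(size - 1, 2 * size - 1)]  # p_k[k^i - 1..2k^i - 2]
    return reversed_prefix == window


print(all(reversal_identity_holds(k, i) for k in (2, 3, 4) for i in range(1, 6)))  # True
```

The two windows overlap in the single position k^i - 1, which is the centre of
the palindromic block p_k[0..2k^i - 2].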



The following lemma describes the unbordered factors of p_k. If w = 0^a x,
then by 0^{-a} w we mean the word x; similarly, if w = x 0^a, then by w 0^{-a} we
mean the word x.

Lemma 4. (a) If w is an unbordered factor of p_k and |w| ≡ 0 (mod k), then
w = φ_k(x) or w = φ_k(x)^R, for some unbordered factor x of p_k with |x| =
|w|/k.

(b) If w is an unbordered factor of p_k and |w| ≡ a (mod k) for 0 < a < k, then
w = 0^{a-k} φ_k(x) or w = φ_k(x)^R 0^{a-k}, for some unbordered factor x of p_k
with |x| = (|w| - a)/k + 1.

Proof. (a): Suppose that w = p_k[i..i + kn - 1] for some integer i. There are two
cases to consider: p_k[i] = 0 and p_k[i] = 1.

Suppose p_k[i] = 0. Since w is unbordered, we have p_k[i + kn - 1] = 1. Then
ν_k(i + kn) ≥ 1, so i + kn = km for some m > 0. Then i = k(m - n) is a multiple
of k, so w = φ_k(x), where x = p_k[i/k..i/k + n - 1]. Note that |x| = |w|/k. Finally,
Lemma 2 shows that x is unbordered.

Suppose p_k[i] = 1. Since w is unbordered, we have p_k[i + kn - 1] = 0. From
Lemma 3 we know that w^R is also a factor of p_k (and also is unbordered). Then
from the previous paragraph, we see that w^R = φ_k(x) for some unbordered
factor x of p_k, with |x| = |w|/k. Then w = φ_k(x)^R, as desired.

(b): Suppose that w = p_k[i..i + kn + a - 1] for 0 < a < k. There are two
cases to consider: p_k[i] = 0 and p_k[i] = 1.

Suppose that p_k[i] = 0. Since w is unbordered, we know that p_k[i + kn +
a - 1] = 1. Then ν_k(i + kn + a) ≥ 1, so i + kn + a = km for some m > 0. Then
i - (k - a) = k(m - n - 1) is a multiple of k. Hence

    0^{k-a} w = p_k[i - (k - a)..i + kn + a - 1]
              = φ_k(p_k[(i + a)/k - 1..(i + a)/k + n - 1]).

Let x = p_k[(i + a)/k - 1..(i + a)/k + n - 1]. Then w = 0^{a-k} φ_k(x), and |x| =
(|w| - a)/k + 1. If x is bordered, then using Lemma 2 we have that 0^{k-a} w has
a border of length ≥ k, so w has a border of length at least a, a contradiction.

Suppose that p_k[i] = 1. Since w is unbordered, we know that p_k[i + kn +
a - 1] = 0. Then by Lemma 3 we know that w^R is also an unbordered factor of
p_k. Then from the previous paragraph, we get that w^R = 0^{a-k} φ_k(x) for some
unbordered factor x of p_k where |x| = (|w| - a)/k + 1. So w = φ_k(x)^R 0^{a-k}, as
desired.

Lemma 5. Let x be a word and w = 1x0 be an unbordered word. Then 0^i φ_k(x0)
is unbordered for 1 ≤ i ≤ k.

Proof. If i = k then 0^k φ_k(x0) = φ_k(1x0) = φ_k(w). Suppose φ_k(w) is bordered;
then there exist u ≠ ε and v such that φ_k(w) = uvu. Since φ_k(0) = 0^{k-1} 1, we
know u ends in 1. But since it is a prefix of φ_k(w) that ends in 1, it follows that
|u| ≡ 0 (mod k), and so u is the image of some word r under φ_k. Hence w begins
and ends with r, a contradiction.

Now assume 1 ≤ i < k and 0^i φ_k(x0) is bordered. Then there exist u ≠ ε and
v such that 0^i φ_k(x0) = uvu; note that u must end in 1. It follows that

    φ_k(w) = 0^k φ_k(x0) = 0^{k-i} (0^i φ_k(x0)) = 0^{k-i} uvu.

Since 0^{k-i} u and 0^{k-i} uvu both end in 1 and 0^{k-i} uvu = φ_k(w), we have
|vu| ≡ 0 (mod k). Hence |u| ≡ i (mod k). It follows that 0^{k-i} uv ends in 0^{k-i},
so 0^{k-i} uvu = φ_k(w) begins and ends in 0^{k-i} u, a contradiction.

We are now ready for the proof of Theorem 6.

Proof. First, we show that there is at least one unbordered factor of every length,
by induction on n. The base cases are n < 2k, and are left to the reader.
Otherwise n ≥ 2k. Write n = kn' + i where 1 ≤ i ≤ k. By induction there is an
unbordered word w of length n' + 1. Using Lemma 3, we can assume that w
begins with 1 and ends with 0, say w = 1x0. By Lemma 5 we have that 0^i φ_k(x0)
is unbordered, and it is of length i + kn' = n.

It remains to prove there are exactly 2 unbordered factors of every length.

If n < 2k, then it is easy to see that the only unbordered factors are 1 0^{n-1}
and 0^{n-1} 1.

Now assume n ≥ 2k and that there are only two unbordered factors of length
n' for all n' < n; we prove it for n. Let w be an unbordered factor of length n.

If n ≡ 0 (mod k), then by Lemma 4 (a), we know that either w = φ_k(x) or
w = φ_k(x)^R, where x is an unbordered factor of length n/k. By induction there
are exactly 2 unbordered factors of length n/k; by Lemma 3 they are reverses
of each other. Let x be such an unbordered factor; since |x| = n/k ≥ 2, either x
begins with 0 and ends with 1, or begins with 1 and ends with 0. In the former
case, the image w = φ_k(x) begins and ends with 0, a contradiction. So x begins
with 1 and ends with 0. But there is only one such factor, so there are only two
possibilities for w.

Otherwise let a = n mod k; then 0 < a < k. By Lemma 4 (b), we know that
w = 0^{a-k} φ_k(x) or w = φ_k(x)^R 0^{a-k}, where x is an unbordered factor of length
(|w| - a)/k + 1 ≥ 2. By induction there are exactly 2 such unbordered words; by
Lemma 3 they are reverses of each other. Let x be such an unbordered factor;
then either x begins with 0 and ends with 1, or begins with 1 and ends with 0.
Let us call them x_0 and x_1, respectively, with x_0 = x_1^R. Now φ_k(x_0) begins with
0^{k-1} 1, and ends with 0^k. Hence, provided a ≠ 1, we see that w = 0^{a-k} φ_k(x_0)
begins with 0 and ends with 0, a contradiction. If a = 1, Lemma 4 (b) gives the
two factors 0^{1-k} φ_k(x_0) and φ_k(x_0)^R 0^{1-k}. The former begins with 1 and ends
with 0; the latter begins with 0 and ends with 1.

In the latter case, x_1 begins with 1 and ends with 0. There is only one such
x_1 (by induction), and then either w = 0^{a-k} φ_k(x_1) or w = φ_k(x_1)^R 0^{a-k}, giving
at most two possibilities for w. In the case a = 1, these two factors would seem
to give a total of four factors of length n. However, there are only two, since
by Lemma 1

    0^{1-k} φ_k(x_0) = 0^{1-k} φ_k(x_1^R) = φ_k(x_1)^R 0^{1-k},
    φ_k(x_0)^R 0^{1-k} = φ_k(x_1^R)^R 0^{1-k} = 0^{1-k} φ_k(x_1).

This completes the proof.